Building Audio-Visual Phonetically Annotated Arabic Corpus for Expressive Text to Speech

Authors

Omnia Abdo

Sherif Abdou

Mervat Fashal

Abstract

This research builds an audio-visual corpus of Modern Standard Arabic (MSA). The corpus is annotated both phonetically and visually and is intended for emotional speech processing studies. Building the corpus involved five main stages: speaker selection, sentence selection, recording, annotation and evaluation. A set of 500 sentences was carefully selected based on phonemic distribution. The speaker was instructed to read the same 500 sentences with six emotions (happiness, sadness, fear, anger, inquiry and neutral). A sample of 50 sentences was selected for annotation. The corpus was evaluated subjectively in three modules: audio, visual and audio-visual. The evaluation showed that happiness, anger and inquiry were better recognized visually (94%, 96% and 96%) than audibly (63.6%, 74% and 74%), with audio-visual scores of 96%, 89.6% and 80.8%. Sadness and fear, on the other hand, were better recognized audibly (76.8% and 97.6%) than visually (58% and 78.8%), with audio-visual scores of 65.6% and 90%.
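
The comparison reported in the abstract can be tabulated with a short sketch. The figures are the ones quoted above, on the assumption that each parenthesized list follows the order in which the emotions are named. The names `SCORES`, `better_single_modality` and `summarize` are illustrative and do not come from the paper. No score is reported for the neutral emotion, so it is left out.

```python
# Recognition rates (%) quoted in the abstract.
# Assumption: each list of percentages is in the same order as the emotions named.
SCORES = {
    "happiness": {"audio": 63.6, "visual": 94.0, "audio_visual": 96.0},
    "anger":     {"audio": 74.0, "visual": 96.0, "audio_visual": 89.6},
    "inquiry":   {"audio": 74.0, "visual": 96.0, "audio_visual": 80.8},
    "sadness":   {"audio": 76.8, "visual": 58.0, "audio_visual": 65.6},
    "fear":      {"audio": 97.6, "visual": 78.8, "audio_visual": 90.0},
}


def better_single_modality(scores):
    """Return whichever of 'audio' or 'visual' has the higher recognition rate."""
    return "visual" if scores["visual"] > scores["audio"] else "audio"


def summarize(table):
    """Map each emotion to its better-recognized single modality."""
    return {emotion: better_single_modality(s) for emotion, s in table.items()}


if __name__ == "__main__":
    for emotion, modality in summarize(SCORES).items():
        print(f"{emotion}: better recognized via {modality}")
```

The sketch compares only the audio and visual conditions, since that is the contrast the abstract draws; the combined audio-visual score is stored but not ranked.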

Similar resources

Designing the Latvian Speech Recognition Corpus

In this paper the authors present the first Latvian speech corpus designed specifically for speech recognition purposes. The paper outlines the decisions made in the corpus design process through analysis of related work on speech corpora creation for different languages. The authors also provide the guidelines that were used for the creation of the Latvian speech recognition corpus. The corpus ...

Efficient Diphone Database Creation for MBROLA, a Multilingual Speech Synthesiser

Diphone synthesis is a convenient way of testing phonetic models of human speech. It allows easy manipulation of duration and pitch, so it is used not only for general intonation contour evaluation but also for expressive speech synthesis. The main advantage of using MBROLA [9], [11], [12], [13] is the fact that not all the diphones need to be contained in the voice to test speech models. ...

Design and recording of Czech speech corpus for audio-visual continuous speech recognition

In this paper we describe the design, recording, and content of a large audio-visual speech database intended for training and testing of audio-visual continuous speech recognition systems. The UWB05-HSCAVC database contains high-resolution video and quality audio data suitable for experiments on audio-visual speech recognition. The corpus consists of nearly 40 hours of audiovisual records of 1...

The MMASCS multi-modal annotated synchronous corpus of audio, video, facial motion and tongue motion data of normal, fast and slow speech

In this paper, we describe and analyze a corpus of speech data that we have recorded in multiple modalities simultaneously: facial motion via optical motion capturing, tongue motion via electro-magnetic articulography, as well as conventional video and high-quality audio. The corpus consists of 320 phonetically diverse sentences uttered by a male Austrian German speaker at normal, fast and slow ...

Evaluating an Authentic Audio-Visual Expressive Speech Corpus

This paper presents an evaluation of the acted part of an audio-visual corpus of emotional speech. This corpus is intended to collect both spontaneous and acted emotions, and then the perceptive efficiency of stimuli to carry emotional expression has to be rated. The evaluation of acted speech is presented here, and will give us a scale to measure the spontaneous expressions.

Publication date: 2017
